AITopics | pruning rate

Collaborating Authors

pruning rate

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient Multi-bit Quantization Network Training via Weight Bias Correction and Bit-wise Coreset Sampling

Neural Information Processing SystemsJun-18-2026, 10:57:09 GMT

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates are repeated for each supported bit-width, resulting in a cost that scales linearly with the number of precisions. Additionally, extra fine-tuning stages are often required to support additional or intermediate precision options, further compounding the overall training burden. To address this issue, we propose two techniques that greatly reduce the training overhead without compromising model utility: (i) Weight bias correction enables shared batch normalization and eliminates the need for fine-tuning by neutralizing quantization-induced bias across bit-widths and aligning activation distributions; and (ii) Bit-wise coreset sampling strategy allows each child model to train on a compact, informative subset selected via gradient-based importance scores by exploiting the implicit knowledge transfer phenomenon. Experiments on CIFAR-10/100, TinyImageNet, and ImageNet-1K with both ResNet and ViT architectures demonstrate that our method achieves competitive or superior accuracy while reducing training time up to 7.88 . Our code is released at this link.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Experimental Results of Pruning Plasticity

Neural Information Processing SystemsApr-25-2026, 22:00:54 GMT

We also studied pruning plasticity on structured pruning. In particular, we choose the filter pruning method used in Li et al. [32]. The pruning criterion is the absolute weight sum of each nonzero filter and the regeneration criterion is the absolute gradient sum of each zero filter. We first pre-train four sets of neural networks from scratch with various structured sparsity, including 0, 0.10, 0.50, and 0.70, noted as "Pre-trained Sparsity" in the figure title. To measure the plasticity of these pre-trained models, we choose four different pruning rates noted as "Pruning rate" to remove filters from these pre-trained models.

artificial intelligence, machine learning, sparsity, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Ari Morcos, Haonan Yu, Michela Paganini, Yuandong Tian

Neural Information Processing SystemsFeb-13-2026, 08:59:54 GMT

Neural Information Processing Systems http://nips.cc/

dataset, initialization, ticket, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre:

Research Report (0.97)
Contests & Prizes (0.94)

Industry: Leisure & Entertainment > Gambling (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

7087c949df293f13c0052ac825936e6f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 17:46:02 GMT

accuracy, gradient norm, pruning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > North Carolina (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

83004190b1793d7aa15f8d0d49a13eba-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 04:44:30 GMT

iterative mag, test accuracy, ticket, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Add feedback

UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

Chiang, Hung-Yueh, Chang, Chi-Chih, Lu, Yu-Chen, Lin, Chien-Yu, Wu, Kai-Chiang, Abdelfattah, Mohamed S., Marculescu, Diana

arXiv.org Artificial IntelligenceDec-9-2025

Deploying large language models (LLMs) on mobile platforms faces significant challenges due to the limited memory and shared computational resources of the device. Resource availability may be an issue as it is directly impacted by the current device workload, adding to the uncertainty of model deployment. We introduce UniQL, a unified post-training quantization and low-rank compression framework with on-device configurable pruning rates for edge LLMs. UniQL is a general framework that integrates quantization and low-rank compression for Transformers, State Space Models (SSMs), and hybrid models to support diverse edge applications. In our proposed joint framework, we introduce an efficient structured weight-sorting method that speeds up computation by 20x, quantization-aware singular value decomposition (SVD) to minimize quantization errors, state-aware weight sorting for SSMs, and a fused rotary positional embedding (RoPE) kernel for pruned models. Our framework performs weight-sorting, fine-tuning, and quantization in the cloud in a single-pass workflow, while enabling on-device configurable pruning rates up to 35%. Our experiments show that quantized and pruned models achieve a memory reduction of 4x-5.7x and a token-throughput improvement of 2.7x-3.4x, maintaining accuracy within 5% of the original models at 15% pruning across Transformers (Llama3 and Qwen2.5), SSMs (Mamba2), and hybrid models (Nemotron-H and Bamba-v2). The code and quantized models are available at: https://github.com/enyac-group/UniQL.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.03383

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

FAIR-Pruner: Leveraging Tolerance of Difference for Flexible Automatic Layer-Wise Neural Network Pruning

Lin, Chenqing, Hussien, Mostafa, Yu, Chengyao, Jing, Bingyi, Cheriet, Mohamed, Abdelrahman, Osama, Ming, Ruixing

arXiv.org Artificial IntelligenceNov-25-2025

Neural network pruning has been widely adopted to reduce the parameter scale of complex neural networks, enabling efficient deployment on resource-limited edge devices. Mainstream pruning methods typically adopt uniform pruning strategies, which tend to cause a substantial performance degradation under high sparsity levels. Recent studies focus on non-uniform layer-wise pruning, but such approaches typically depend on global architecture optimization, which is computational expensive and lacks flexibility. To address these limitations, this paper proposes a novel method named Flexible Automatic Identification and Removal (FAIR)-Pruner, which adaptively determines the sparsity levels of each layer and identifies the units to be pruned. The core of FAIR-Pruner lies in the introduction of a novel indicator, Tolerance of Differences (ToD), designed to balance the importance scores obtained from two complementary perspectives: the architecture-level (Utilization Score) and the task-level (Reconstruction Score). By controlling ToD at preset levels, FAIR-Pruner determines layer-specific thresholds and removes units whose Utilization Scores fall below the corresponding thresholds. Furthermore, by decoupling threshold determination from importance estimation, FAIR-Pruner allows users to flexibly obtain pruned models under varying pruning ratios. Extensive experiments demonstrate that FAIR-Pruner achieves state-of-the-art performance, maintaining higher accuracy even at high compression ratios. Moreover, the ToD based layer-wise pruning ratios can be directly applied to existing powerful importance measurements, thereby improving the performance under uniform-pruning.

artificial intelligence, machine learning, pruning, (15 more...)

arXiv.org Artificial Intelligence

2508.02291

Country: North America > Canada (0.28)

Genre: Research Report > Promising Solution (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

TETRIS: TilE-matching the TRemendous Irregular Sparsity

Yu Ji, Ling Liang, Lei Deng, Youyang Zhang, Youhui Zhang, Yuan Xie

Neural Information Processing SystemsNov-20-2025, 18:03:19 GMT

Compressing neural networks by pruning weights with small magnitudes can significantly reduce the computation and storage cost. Although pruning makes the model smaller, it is difficult to get a practical speedup in modern computing platforms such as CPU and GPU due to the irregularity.

machine learning, natural language, sparsity, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)

Add feedback

Discrimination-aware Channel Pruning for Deep Neural Networks

Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu

Neural Information Processing SystemsNov-20-2025, 16:32:59 GMT

Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels.

artificial intelligence, feature map, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback